Using pre & post-processing methods to improve binding site predictions
Authors
Abstract
Currently the best algorithms for transcription factor binding site prediction within sequences of regulatory DNA are severely limited in accuracy. In this paper, we integrate 12 original binding site prediction algorithms, and use a ‘window’ of consecutive predictions in order to contextualise the neighbouring results. We combine either random selection or Tomek links under-sampling with SMOTE over-sampling techniques. In addition, we investigate the behaviour of four feature selection filtering methods: Bi-Normal Separation, Correlation Coefficients, F-Score and a cross entropy based algorithm. Finally, we remove some of the final predicted binding sites on the basis of their biological plausibility. The results show that we can generate a new prediction that significantly improves on the performance of any one of the individual algorithms.
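Two of the steps named in the abstract, the 'window' of consecutive predictions and F-Score feature filtering, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract gives neither the window size nor how sequence ends are handled, so the `half_width` parameter and the zero-padding are assumptions. The `f_score` function follows one common definition of the F-Score filter (between-class separation of the means divided by the summed within-class sample variances), which may differ in detail from the one used in the paper. All function and parameter names are hypothetical.

```python
from statistics import mean, variance


def window_features(predictions, half_width):
    """Join each position's prediction vector with those of its neighbours.

    predictions: list of per-position lists, one score per base algorithm.
    half_width:  neighbours taken on each side (assumed parameter).
    Positions beyond either end of the sequence are zero-padded (assumption).
    """
    n_algorithms = len(predictions[0])
    padding = [0.0] * n_algorithms
    windowed = []
    for centre in range(len(predictions)):
        row = []
        for pos in range(centre - half_width, centre + half_width + 1):
            inside = 0 <= pos < len(predictions)
            row.extend(predictions[pos] if inside else padding)
        windowed.append(row)
    return windowed


def f_score(feature_values, labels):
    """F-Score of one feature for binary labels (1 = binding site, 0 = not).

    Needs at least two samples per class, because sample variance is used.
    """
    pos = [v for v, y in zip(feature_values, labels) if y == 1]
    neg = [v for v, y in zip(feature_values, labels) if y == 0]
    overall = mean(feature_values)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)
    return between / within if within > 0 else 0.0


# Three sequence positions, two base algorithms each (toy values).
toy_predictions = [[1, 0], [0, 1], [1, 1]]
toy_windowed = window_features(toy_predictions, half_width=1)

# One toy feature column with its class labels.
toy_f = f_score([1, 2, 3, 4], [0, 0, 1, 1])
```

With `half_width=1` each position is described by three consecutive prediction vectors, so twelve base algorithms would give 36 features per position. A filter such as `f_score` can then rank those features before classification.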
Similar resources
Using sampling methods to improve binding site predictions
Currently the best algorithms for transcription factor binding site prediction are severely limited in accuracy. In previous work we combined random selection under-sampling with the SMOTE over-sampling technique, working with several classification algorithms from the machine learning field to integrate binding site predictions. In this paper, we improve the classification result with the aid of Tomek ...
Analysis of Pre-processing and Post-processing Methods and Using Data Mining to Diagnose Heart Diseases
Today, a great deal of data is generated in the medical field. Acquiring useful knowledge from this raw data requires data processing and detection of meaningful patterns and this objective can be achieved through data mining. Using data mining to diagnose and prognose heart diseases has become one of the areas of interest for researchers in recent years. In this study, the literature on the ap...
Integrating binding site predictions using meta classification methods
Currently the best algorithms for transcription factor binding site prediction are severely limited in accuracy. There is good reason to believe that predictions from these different classes of algorithms could be used in conjunction to improve the quality of predictions. In this paper, we apply single layer networks and support vector machines on predictions from key algorithms. Furthermore, w...
Improving Computational Predictions of Cis-Regulatory Binding Sites
The locations of cis-regulatory binding sites determine the connectivity of genetic regulatory networks and therefore constitute a natural focal point for research into the many biological systems controlled by such regulatory networks. Accurate computational prediction of these binding sites would facilitate research into a multitude of key areas, including embryonic development, evolution, pha...
Predicting DNA-Binding Sites by Exploring the Distribution of Atom Groups around the Surface
DNA-binding proteins perform various functions in the cells. Determining the structures of protein-DNA complexes using experimental methods is hindered by many obstacles. Thus, computational methods for predicting DNA-binding sites on protein structures are needed to elucidate the mechanism of protein-DNA interactions. In this study, we divided atoms of amino acid residues into 14 groups and us...
Journal title:
- Pattern Recognition
Volume 42, Issue
Pages -
Publication date 2009